I chose this dataset because I want to investigate how Prosper assesses loans for risk. I'll first take a general look at the dataset to get a feel for a few variables of interest, and then I'll check for relationships between those variables.
My ultimate goal is to see what generally affects the rate and loan amount.
Most of the loans in this dataset went to employed individuals. The middle 50% of loans have a Prosper score between 4 and 8, a lower-range credit score between 660 and 720, and an upper-range credit score between 679 and 739.
summary(data$EmploymentStatus)
##               Employed     Full-time Not available  Not employed
##          2255         67322         26355          5347           835
##         Other     Part-time       Retired Self-employed
##          3806          1088           795          6134
summary(data$ProsperScore)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 4.00 6.00 5.95 8.00 11.00 29084
summary(data$CreditScoreRangeLower)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 660.0 680.0 685.6 720.0 880.0 591
summary(data$CreditScoreRangeUpper)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 19.0 679.0 699.0 704.6 739.0 899.0 591
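The median debt-to-income ratio was 0.22 (mean 0.28). Borrowers had an average of about 4 delinquencies in the 7 years prior to review, although the median is 0 and the distribution is heavily right-skewed (maximum 99). They also had 1 to 2 credit inquiries in the six months prior to review (median 1, mean 1.4).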
summary(data$DebtToIncome)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.140 0.220 0.276 0.320 10.010 8554
library(psych)  # describe() comes from the psych package
describe(data$DebtToIncome)
## vars n mean sd median trimmed mad min max range skew kurtosis
## 1 1 105383 0.28 0.55 0.22 0.23 0.12 0 10.01 10.01 15.44 261.31
## se
## 1 0
summary(data$DelinquenciesLast7Years)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 0.000 4.155 3.000 99.000 990
summary(data$InquiriesLast6Months)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.000 1.000 1.435 2.000 105.000 697
The most common loan amount is $4,000, the most common term is 36 months, and the most common number of investors per loan is one.
The mean monthly loan payment is $272.50 (median $217.70), but the most common payment is $173.71/month.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 131.6 217.7 272.5 371.6 2252.0
## [1] "173.71"
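The R chunk that produced the `"173.71"` value is not shown. Because the printed result is a character string, one plausible way to get it is to tabulate the payments and take the most frequent value. The sketch below is a hypothetical helper (`mode_value` is not a base R function), and it assumes the payment column is named `MonthlyLoanPayment`.

```r
# Hypothetical helper: return the most frequent value of a vector.
# table() drops NAs and stores distinct values as character names,
# which is why the result is a string such as "173.71".
mode_value <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]  # first value with the highest count
}

# Toy payments for illustration only (not Prosper data)
toy_payments <- c(173.71, 250.00, 173.71, 310.25)
mode_value(toy_payments)
```

On the full dataset this would be `mode_value(data$MonthlyLoanPayment)`, assuming that column name. With ties, `which.max` returns the smallest of the tied values, because `table()` sorts numeric values in ascending order.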
The occupations borrowers listed most often are “Other” and “Professional.”
Unemployed individuals also received loans. In the bivariate section, I’ll compare the loan amounts given to unemployed individuals with those given to employed individuals.
The dataset contains 113,937 loans and 81 variables, such as the Prosper score, credit score, delinquencies in the last 7 years, and open credit lines.
I am investigating the factors that affect the borrower rate and loan amount. I figured I should first understand what affects the Prosper score, so the variables of interest are the Prosper score, credit scores, delinquencies, occupation, employment status, income, borrower rate, APR, and original loan amount.
Other variables that could support my investigation are the revolving credit balance, the individual’s homeownership status, and the individual’s public records within the past 10 years. Since people lend money to other people on Prosper, I assume that some human judgement about an individual’s overall trustworthiness helps decide how much the individual will be granted and what the interest rate and APR will be.
Not in this section, although it was tempting to rework the income range variable.
I haven’t observed any unusual distributions yet.
How much of an effect does the Prosper score have on an individual’s loan application? What can the Prosper score tell us about the lenders and the borrowers? Can the data show us its effects if we compare it to one variable at a time?
Let’s first take a look at the potential effects of the Prosper score on the loan approval process.
Does the Prosper score affect the amount of loan given to borrowers?
It appears that there may be a pattern.
Setting aside the few odd cases under each Prosper score (which we could treat as noise around a potential trend), here is the loan amount at which each score’s loans are most concentrated:
Prosper score 1: $10,000
Prosper score 2: $15,000
Prosper score 3: $20,000
Prosper score 4: $25,000
Prosper score 5: $25,000
Prosper score 6: $25,000
Prosper score 7: $25,000
Prosper score 8: $25,000
Prosper score 9: $25,000
Prosper score 10: $25,000
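The list above can be reproduced numerically by finding, for each Prosper score, the loan amount that occurs most often. This is a minimal sketch using a hypothetical helper, `modal_amount_by_score`. It assumes the columns are named `ProsperScore` and `LoanOriginalAmount`; the latter name is an assumption, since that column is not printed in this report.

```r
# Hypothetical helper: most common original loan amount for each Prosper score.
# Assumes columns ProsperScore and LoanOriginalAmount; rows missing either are dropped.
modal_amount_by_score <- function(df) {
  df <- df[!is.na(df$ProsperScore) & !is.na(df$LoanOriginalAmount), ]
  amounts_by_score <- split(df$LoanOriginalAmount, df$ProsperScore)
  sapply(amounts_by_score, function(amounts) {
    counts <- table(amounts)
    as.numeric(names(counts)[which.max(counts)])  # modal amount for this score
  })
}
```

Calling `modal_amount_by_score(data)` would return a named numeric vector with one entry per Prosper score.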
Beyond a score of 4, there isn’t much of a pattern: the most concentrated amount rises from $10,000 to $25,000 over scores 1 to 4 and then stays flat. It could be that people with higher Prosper scores feel comfortable taking bigger loans because they are better positioned to pay them back. Taking another look at this data may give more insight into possible underlying patterns.
Viewed this way, it’s more apparent that most loans in this data are taken in $1,000 multiples, with $15,000 the most common of the larger amounts, followed by $25,000. This suggests that the loan amount alone may not tell us much, since from a score of 3 and up it’s possible to get a $35,000 loan.
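To check the $1,000-multiple observation numerically rather than by eye, one can compute the share of amounts divisible by 1,000 and list the most frequent amounts. This is a base-R sketch. Both helper names are hypothetical, and the column name `LoanOriginalAmount` is assumed.

```r
# Hypothetical helper: share of non-missing loan amounts that are exact $1,000 multiples.
share_thousand_multiples <- function(amounts) {
  amounts <- amounts[!is.na(amounts)]
  mean(amounts %% 1000 == 0)
}

# Hypothetical helper: the n most frequent loan amounts with their counts.
top_amounts <- function(amounts, n = 5) {
  head(sort(table(amounts), decreasing = TRUE), n)
}
```

On the full data this would be `share_thousand_multiples(data$LoanOriginalAmount)` and `top_amounts(data$LoanOriginalAmount)`, assuming that column name.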
Does the Prosper score have an effect on the borrower’s APR and rate?
Here is the first major sign of a relationship between the Prosper score and the loans: the higher the Prosper score, the lower the borrower APR and rate. This is expected. The better an individual’s credit history, the more trustworthy he or she appears to lenders, and the Prosper score is a risk score based on historical Prosper data. However, we haven’t yet seen evidence that the Prosper score is related to the individual’s credit score.
So is it safe to say that, to get a good, low APR and rate on a Prosper loan, one must have a high Prosper score? Let’s find out what else might be affecting the borrower APR and rate.
Could the credit score be affecting them?
Yes, but something looks a bit odd. The patterns are not as clear and consistent as they are for the Prosper score, so it’s probably safe to say that the Prosper score is the better predictor of the borrower APR and rate.
So then what affects the Prosper score? Does employment status affect it?
The mix of employment statuses looks roughly constant across all Prosper scores. The plot also shows that fewer individuals have a Prosper score of 1, 9, or 10.
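One way to back up the visual impression that the Prosper score tracks the rate more cleanly than the credit score is to compare rank (Spearman) correlations. This is a swapped-in technique, not what the plots show. `BorrowerRate` is an assumed column name, and `rate_associations` is a hypothetical helper.

```r
# Hypothetical helper: Spearman correlation of BorrowerRate with two candidate predictors.
# For each pair, rows with a missing value in either variable are dropped.
rate_associations <- function(df) {
  spearman <- function(x) {
    cor(x, df$BorrowerRate, method = "spearman", use = "complete.obs")
  }
  c(ProsperScore = spearman(df$ProsperScore),
    CreditScoreRangeLower = spearman(df$CreditScoreRangeLower))
}
```

A value closer to -1 means a more consistently decreasing relationship. If the plots are being read correctly, `rate_associations(data)` should show the more strongly negative value for `ProsperScore`, but this has not been computed here.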
Does income have an effect on the Prosper score?
Income range does not appear to have any clear effect on the Prosper score, with a possible exception at a Prosper score of 10. But it could simply be that no one in this dataset with an income range of $0 has a Prosper score of 10. So, at this point and in this view, we can set income range aside as a significant variable and revisit it later.
We should pay attention to the relationship between the income range and the original loan amount. Perhaps we should also quickly check whether homeowners are treated differently here.
Here we can observe something striking: only those who made over $100,000/year received loans over $25,000. This plot also adds to the evidence that individuals with the unemployed status received smaller amounts.
Let’s take a closer look to further analyze how the Prosper score relates to the original loan amount.
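The observation that only six-figure earners received loans over $25,000 can be checked directly by listing the income ranges that appear among those loans. This is a sketch using a hypothetical helper. It assumes columns named `IncomeRange` and `LoanOriginalAmount`. If the claim holds, the result on the full data should contain only the top income level, assumed here to be labelled `"$100,000+"`.

```r
# Hypothetical helper: which income ranges appear among loans above a threshold?
# Assumes columns IncomeRange and LoanOriginalAmount.
income_ranges_above <- function(df, threshold = 25000) {
  over <- !is.na(df$LoanOriginalAmount) & df$LoanOriginalAmount > threshold
  sort(unique(as.character(df$IncomeRange[over])))
}
```

Usage: `income_ranges_above(data)`.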
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 5.000 7.000 6.738 9.000 11.000 2132
This detail was not apparent in the earlier plots. We should also check the Prosper scores in the lower income ranges to see whether they differ. If they don’t, we can rule income range out of this investigation for good. The first summary below is for the lower income ranges, and the second repeats the higher-income summary for comparison.
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 4.000 5.000 5.093 7.000 11.000 2620
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 5.000 7.000 6.738 9.000 11.000 2132
We can now say that earning over $100,000/year is associated with a Prosper score roughly 2 points higher. The medians and third quartiles differ by exactly 2 points (7 vs. 5 and 9 vs. 7), and the means differ by about 1.6 points (6.738 vs. 5.093). That was a slight surprise. I was ready to rule income out far too early, perhaps because there are 81 variables in this dataset and I am eager to look through many of them.
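The comparison above comes down to subtracting the two summaries, and the gap can also be computed directly. This sketch assumes the top income level is labelled `"$100,000+"` and treats every other non-missing `IncomeRange` value as "lower." The grouping actually used above isn't shown, so both of these are assumptions, and `score_gap` is a hypothetical helper.

```r
# Hypothetical helper: mean and median Prosper-score gap between the top income
# range and all other non-missing income ranges. Missing scores are ignored.
score_gap <- function(df, high = "$100,000+") {
  is_high <- as.character(df$IncomeRange) == high  # NA where IncomeRange is missing
  hi <- df$ProsperScore[is_high %in% TRUE]
  lo <- df$ProsperScore[is_high %in% FALSE]
  c(mean_gap   = mean(hi, na.rm = TRUE) - mean(lo, na.rm = TRUE),
    median_gap = median(hi, na.rm = TRUE) - median(lo, na.rm = TRUE))
}
```

From the printed summaries, the mean gap is 6.738 - 5.093 = 1.645 and the median gap is 7 - 5 = 2.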
Let’s see if occupation shows any trends in the Prosper score:
Occupation doesn’t seem to matter much. It looks like nearly anyone in any occupation could have a high or low Prosper score. The small differences in the plot are most likely due to the dataset having few individuals in some occupations, so they probably reflect the sample rather than a real relationship in the data.
Are credit-seekers (borrowers with many recent credit inquiries) penalized by the Prosper scoring system?
There doesn’t appear to be a significant difference. The relationship looks a bit wavy, which rules the idea out from this perspective.
How about delinquencies in the past 7 years?
The curve in this graph undercuts that idea. One would expect the number of delinquencies within a 7-year period to decrease steadily from the lowest Prosper score to the highest, because those who obtained a higher Prosper score were presumably the ones who made their payments on time over the years. However, that does not seem to be the case here.
Now let’s go back to employment status, since we found something interesting earlier. The unemployed individuals in the dataset have loans anywhere between $1,000 and $25,000. Most are $4,000.
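The range and most common loan amount for unemployed borrowers can be pulled out with a small hypothetical helper. The `"Not employed"` label comes from the `EmploymentStatus` summary at the top of this report, while `LoanOriginalAmount` is an assumed column name.

```r
# Hypothetical helper: min, max, and most common loan amount for one employment status.
status_amount_profile <- function(df, status = "Not employed") {
  amounts <- df$LoanOriginalAmount[as.character(df$EmploymentStatus) %in% status]
  amounts <- amounts[!is.na(amounts)]
  counts <- table(amounts)
  c(min  = min(amounts),
    max  = max(amounts),
    mode = as.numeric(names(counts)[which.max(counts)]))
}
```

Usage: `status_amount_profile(data)` for unemployed borrowers, or `status_amount_profile(data, "Employed")` for comparison.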
It would be a good idea to see what their credit is like. They could have been unemployed for any reason.
Does the credit score have an effect on the Prosper score?
Not significantly. If someone with a credit score of about 630 can have a Prosper score of 10 (the highest), and someone with a credit score of about 840 can have a Prosper score of 1 (the lowest), then I would rather move on to more promising variables.
Unemployed individuals received smaller loans than employed individuals (perhaps they asked for less), but for the most part their credit scores are similar to those of the employed individuals.
Regardless, the employed individuals have higher Prosper scores overall than the unemployed individuals. So it could be that unemployed individuals are granted smaller amounts because lenders have less confidence that the loan will be paid off.
Another important finding is that individuals with credit scores anywhere between 600 and 800+ can have a Prosper score of 1. In fact, even individuals with credit scores of 600 can have a Prosper score of 10.
Those with more credit inquiries and more delinquencies over the prior 7 years did not show significant penalties in their Prosper score; however, other factors may have offset them.
It seems that higher income leads to a higher Prosper score and access to higher loan amounts. Another fact that could support this is that those who reported being unemployed had smaller loan amounts, even though they appeared to have higher credit scores (not Prosper scores). So income may be strongly related to risk assessment within the Prosper community.
## ProsperScore Term CreditScoreRangeLower CreditScoreRangeUpper
## 113932 6 60 800 819
## 113933 5 36 700 719
## 113934 8 36 700 719
## 113935 3 60 700 719
## 113936 5 60 680 699
## 113937 7 36 680 699
## InquiriesLast6Months DelinquenciesLast7Years RevolvingCreditBalance
## 113932 2 0 566
## 113933 0 7 7714
## 113934 1 4 15743
## 113935 1 0 22147
## 113936 1 0 11956
## 113937 0 3 6166
## ProsperScore Term CreditScoreRangeLower CreditScoreRangeUpper
## 1 NA 36 640 659
## 2 7 36 680 699
## 3 NA 36 480 499
## 4 9 36 800 819
## 5 4 36 680 699
## 6 10 60 740 759
## InquiriesLast6Months DelinquenciesLast7Years RevolvingCreditBalance
## 1 3 4 0
## 2 3 0 3989
## 3 0 0 NA
## 4 0 14 1444
## 5 1 0 6193
## 6 0 0 62999
Individuals with a Prosper score between 1 and 3 mostly have smaller original loan amounts than those with a score between 4 and 10. Above a score of 4 to 6, the only clear advantage seems to be that borrowers who are ‘not employed’ or have an employment status of ‘other’ receive larger original loan amounts.
It’s apparent that the rate decreases as the loan amount increases, and that the higher loan amounts are given to those who have a high Prosper score.
The Prosper score seems to have a significant effect on the loan amount granted when it is between 1 and 6, but little additional effect once it reaches 6. At that point, other factors, such as employment status, show more of an effect.
Self-employed individuals appear to be trusted just as much as individuals who are not self-employed. At first, one might assume that self-employed individuals would be seen as higher-risk borrowers with less income security.
The median, Q1, and Q3 of the borrower rate decrease as the Prosper score increases. The decrease in the median and Q3 is interrupted at Prosper score 5 but resumes from score 6 to 10. Other factors may have affected the borrower rate for those with a Prosper score of 5 in this dataset.
As the income range increases, so do the median loan amount and its first and third quartiles. This trend is most likely because higher loan amounts are available to borrowers in higher income ranges.
The $0 distribution looks similar to the $25,000-49,999 one, which suggests that $0 may be used as a way to withhold income information. It could be that when income information is not provided, the individual is treated as having the same level of risk as someone with an income range of $25,000-49,999.
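The quartile pattern described above can be tabulated with base R's `aggregate`, which here applies `quantile` to the rate within each Prosper score. This is a sketch that assumes a `BorrowerRate` column, and `rate_quartiles_by_score` is a hypothetical helper.

```r
# Hypothetical helper: Q1, median, and Q3 of BorrowerRate for each Prosper score.
# The formula interface drops rows with missing values before grouping.
rate_quartiles_by_score <- function(df) {
  aggregate(BorrowerRate ~ ProsperScore, data = df,
            FUN = function(rate) quantile(rate, probs = c(0.25, 0.5, 0.75)))
}
```

The result has one row per score, and the three quantiles are stored as a matrix column (`result$BorrowerRate`). A break in the trend, such as the one at score 5, shows up as a non-negative value in `diff(result$BorrowerRate[, 2])`.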
The borrower rate decreases as the original loan amount increases. It’s also evident that those with the higher Prosper scores are mostly awarded lower rates. For instance, loans that are above $25,000 are given at a rate below 0.2, and loans below $15,000 are given at rates between 0.05 and 0.35.
This dataset shows that many factors go into assessing risk before a Prosper loan is granted, since the Prosper score seems to be affected by other variables that I have yet to analyze. Ultimately, though, the variables most closely related to the borrower rate and loan amount are the individual’s income range and Prosper score. The analysis also showed that the Prosper score is about 2 points higher (by median) for those with six-figure incomes. So, overall, income range appears to be the biggest factor in risk assessment for Prosper loans. However, other factors also affect the Prosper score, such as employment status and others that I have not analyzed. To better understand the Prosper score, the rest of the 81 variables should be compared to it.
One of the biggest challenges I faced in this analysis was computing power. Generating these plots for analysis and verification takes a long time, so I had to work on my desktop for its beefier hardware (an i7). I eventually started using NoMachine to regain mobility, which let me access the desktop from a laptop.
My conclusion is that employed individuals with high income ranges are granted the lowest rates and the highest loan amounts. I suppose that, from the lenders’ end, this goes hand-in-hand with the good ole market ideology of “low risk, low gain/loss; high risk, high gain/loss.” This is some good knowledge to have, but the most valuable part of this analysis is that it helped me become familiar with analyzing loans and risk models. At least, next time, I’ll have a better intuition for where to begin and what type of questions to ask.